FIGURE 1 



AGTCCCAGACGGGCTTTTCCCAGAGAGCTAAAAGAGAAGGGCCAGAGAATGTCGTCCCAG 
CCAGCAGGGAACCAGACCTCCCCCGGGGCCACAGAGGACTACTCCTATGGCAGCTGGTAC
ATCGATGAGCCCCAGGGGGGCGAGGAGCTCCAGCCAGAGGGGGAAGTGCCCTCCTGCCAC 
ACCAGCATACCACCCGGCCTGTACCACGCCTGCCTGGCCTCGCTGTCAATCCTTGTGCTG 
CTGCTCCTGGCCATGCTGGTGAGGCGCCGCCAGCTCTGGCCTGACTGTGTGCGTGGCAGG 
CCCGGCCTGCCCAGCCCTGTGGATTTCTTGGCTGGGGACAGGCCCCGGGCAGTGCCTGCT 

GCTGTTTTCATGGTCCTCCTGAGCTCCCTGTGTTTGCTGCTCCCCGACGAGGACGCATTG
CCCTTCCTGACTCTCGCCTCAGCACCCAGCCAAGATGGGAAAACTGAGGCTCCAAGAGGG 
GCCTGGAAGATACTGGGACTGTTCTATTATGCTGCCCTCTACTACCCTCTGGCTGCCTGT 
GCCACGGCTGGCCACACAGCTGCACACCTGCTCGGCAGCACGCTGTCCTGGGCCCACCTT 
GGGGTCCAGGTCTGGCAGAGGGCAGAGTGTCCCCAGGTGCCCAAGATCTACAAGTACTAC 

TCCCTGCTGGCCTCCCTGCCTCTCCTGCTGGGCCTCGGATTCCTGAGCCTTTGGTACCCT

GTGCAGCTGGTGAGAAGCTTCAGCCGTAGGACAGGAGCAGGCTCCAAGGGGCTGCAGAGC 
AGCTACTCTGAGGAATATCTGAGGAACCTCCTTTGCAGGAAGAAGCTGGGAAGCAGCTAC 
CACACCTCCAAGCATGGCTTCCTGTCCTGGGCCCGCGTCTGCTTGAGACACTGCATCTAC 
ACTCCACAGCCAGGATTCCATCTCCCGCTGAAGCTGGTGCTTTCAGCTACACTGACAGGG 

ACGGCCATTTACCAGGTGGCCCTGCTGCTGCTGGTGGGCGTGGTACCCACTATCCAGAAG
GTGAGGGCAGGGGTCACCACGGATGTCTCCTACCTGCTGGCCGGCTTTGGAATCGTGCTC 
TCCGAGGACAAGCAGGAGGTGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTGGAAGTG 
TGCTACATCTCAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTCCTGATGCGCTCA 
CTGGTGACACACAGGACCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGACTTGAGT 

CCCTTGCATCGGAGTCCCCATCCCTCCCGCCAAGCCATATTCTGTTGGATGAGCTTCAGT
GCCTACCAGACAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTCTTCCTG 
GGAACCACGGCCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAACCTCCTG 
CTCTTCCGTTCCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCTGTGATC 
CTGCAGAACATGGCAGCCCATTGGGTCTTCCTGGAGACTCATGATGGACACCCACAGCTG 

ACCAACCGGCGAGTGCTCTATGCAGCCACCTTTCTTCTCTTCCCCCTCAATGTGCTGGTG
GGTGCCATGGTGGCCACCTGGCGAGTGCTCCTCTCTGCCCTCTACAACGCCATCCACCTT 
GGCCAGATGGACCTCAGCCTGCTGCCACCGAGAGCCGCCACTCTCGACCCCGGCTACTAC 
ACGTACCGAAACTTCTTGAAGATTGAAGTCAGCCAGTCGCATCCAGCCATGACAGCCTTC 
TGCTCCCTGCTCCTGCAAGCGCAGAGCCTCCTACCCAGGACCATGGCAGCCCCCCAGGAC 

AGCCTCAGACCAGGGGAGGAAGACGAAGGGATGCAGCTGCTACAGACAAAGGACTCCATG
GCCAAGGGAGCTAGGCCCGGGGCCAGCCGCGGCAGGGCTCGCTGGGGTCTGGCCTACACG 
CTGCTGCACAACCCAACCCTGCAGGTCTTCCGCAAGACGGCCCTGTTGGGTGCCAATGGT 
GCCCAGCCC TGA GGGCAGGGAAGGTCAACCCACCTGCCCATCTGTGCTGAGGCATGTTCC 
TGCCTACCATCCTCCTCCCTCCCCGGCTCTCCTCCCAGCATCACACCAGCCATGCAGCCA 

GCAGGTCCTCCGGATCACTGTGGTTGGGTGGAGGTCTGTCTGCACTGGGAGCCTCAGGAG
GGCTCTGCTCCACCCACTTGGCTATGGGAGAGCCAGCAGGGGTTCTGGAGAAAAAAACTG 
GTGGGTTAGGGCCTTGGTCCAGGAGCCAGTTGAGCCAGGGCAGCCACATCCAGGCGTCTC 
CCTACCCTGGCTCTGCCATCAGCCTTGAAGGGCCTCGATGAAGCCTTCTCTGGAACCACT 
CCAGCCCAGCTCCACCTCAGCCTTGGCCTTCACGCTGTGGAAGCAGCCAAGGCACTTCCT 

CACCCCCTCAGCGCCACGGACCTCTCTGGGGAGTGGCCGGAAAGCTCCCGGTCCTCTGGC
CTGCAGGGCAGCCCAAGTCATGACTCAGACCAGGTCCCACACTGAGCTGCCCACACTCGA 
GAGCCAGATATTTTTGTAGTTTTTATGCCTTTGGCTATTATGAAA 
CCTGCAATAAACTTGTTCCTGAGAAAAAAAAAAAAAAAAAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
FIGURE 2

MSSQPAGNQTSPGATEDYSYGSWYIDEPQGGEELQPEGEVPSCHTSIPPGLYHACLASLS
ILVLLLLAMLVRRRQLWPDCVRGRPGLPSPVDFLAGDRPRA

EDALPFLTLASAPSQDGKTEAPRGAWKILGLFYYAALYYPLAACATAGHTAAHLLGSTLS 
WAHLGVQVWQRAECPQVPKIYKYYSLLASLPLLLGLGFLSLWYPVQLVRSFSRRTGAGSK
GLQSSYSEEYLRNLLCRKKLGSSYHTSKHGFLSWARVCLRHCIYTPQPGFHLPLKLVLSA 
TLTGTAIYQVALLLLVGVVPTIQKV

ALEVCYISALVLSCLLTFLVLMRSLVTHRTNLRALHRGAALDLSPLHRSPHPSRQAIFCW
MSFSAYQTAFICLGLLVQQIIFFLGTTALAFLVLMPVLHGRNLLLFRSLESSWPFWLTLA 
LAVILQNMAAHWVFLETHDGHPQLTNRRVL

AIHLGQMDLSLLPPRAATLDPGYYTYRNFLKIEVSQSHPAMTAFCSLLLQAQSLLPRTMA 
APQDSLRPGEEDEGMQLLQTKDSMAKGARPGASRGRARWGLAYTLLHNPTLQVFRKTALL 

GANGAQP 

Important features of the protein: 
Signal peptide: 

None 

Transmembrane domain: 

54-69 

102-119 

148-166 

207-222 

301-320 

364-380 

431-451 

474-489 

560-535 

Motif file: 

Motif name: N-glycosylation site. 
8-12 

Motif name: N-myristoylation site. 

50-56 

176-182 

241-247 

317-323 

341-347 

525-531 

627-633 

631-637 

640-646 

661-667 

Motif name: Prokaryotic membrane lipoprotein lipid attachment site. 
364-375 

Motif name: ATP/GTP-binding site motif A (P-loop). 



132-140 
FIGURE 3A

PRO                  XXXXXXXXXXXXXXX   (Length = 15 amino acids)
Comparison Protein   XXXXXYYYYYYY      (Length = 12 amino acids)

% amino acid sequence identity =

(the number of identically matching amino acid residues between the two polypeptide sequences as determined by ALIGN-2) divided by (the total number of amino acid residues of the PRO polypeptide) =

5 divided by 15 = 33.3%


FIGURE 3B 



PRO                  XXXXXXXXXX        (Length = 10 amino acids)
Comparison Protein   XXXXXYYYYYYZZYZ   (Length = 15 amino acids)

% amino acid sequence identity =

(the number of identically matching amino acid residues between the two polypeptide sequences as determined by ALIGN-2) divided by (the total number of amino acid residues of the PRO polypeptide) =

5 divided by 10 = 50%



FIGURE 3C 



PRO-DNA          NNNNNNNNNNNNNN     (Length = 14 nucleotides)
Comparison DNA   NNNNNNLLLLLLLLLL   (Length = 16 nucleotides)

% nucleic acid sequence identity =

(the number of identically matching nucleotides between the two nucleic acid sequences as determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA nucleic acid sequence) =

6 divided by 14 = 42.9%
FIGURE 3D

PRO-DNA          NNNNNNNNNNNN   (Length = 12 nucleotides)
Comparison DNA   NNNNLLLVV      (Length = 9 nucleotides)

% nucleic acid sequence identity =

(the number of identically matching nucleotides between the two nucleic acid sequences as determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA nucleic acid sequence) =

4 divided by 12 = 33.3%
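
The arithmetic in Figures 3A to 3D can be sketched as a small C helper. This is only an illustration: the function name `percent_identity` and its error convention are assumptions and do not appear in the ALIGN-2 source. The match count is taken as given, and the divisor is the length of the PRO (or PRO-DNA) sequence, not the comparison sequence.

```c
/* Hypothetical helper (not part of ALIGN-2): percent sequence identity as
 * defined in Figures 3A-3D, i.e. the number of identical matches divided by
 * the length of the PRO (or PRO-DNA) sequence, expressed as a percentage.
 * Returns -1.0 when pro_len is not positive. */
double percent_identity(int matches, int pro_len)
{
    if (pro_len <= 0)
        return -1.0;
    return 100.0 * (double)matches / (double)pro_len;
}
```

With the figures' values, `percent_identity(5, 15)` is about 33.3 and `percent_identity(6, 14)` is about 42.9. The length of the comparison sequence never enters the result.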
/*
 * C-C increased from 12 to 15
 * Z is average of EQ
 * B is average of ND
 * match with stop is _M; stop-stop = 0; J (joker) match = 0
 */

#define _M      -8      /* value of a match with a stop */

int     _day[26][26] = {
/*         A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z */
/* A */ {  2, 0,-2, 0, 0,-4, 1,-1,-1, 0,-1,-2,-1, 0,_M, 1, 0,-2, 1, 1, 0, 0,-6, 0,-3, 0},
/* B */ {  0, 3,-4, 3, 2,-5, 0, 1,-2, 0, 0,-3,-2, 2,_M,-1, 1, 0, 0, 0, 0,-2,-5, 0,-3, 1},
/* C */ { -2,-4,15,-5,-5,-4,-3,-3,-2, 0,-5,-6,-5,-4,_M,-3,-5,-4, 0,-2, 0,-2,-8, 0, 0,-5},
/* D */ {  0, 3,-5, 4, 3,-6, 1, 1,-2, 0, 0,-4,-3, 2,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 2},
/* E */ {  0, 2,-5, 3, 4,-5, 0, 1,-2, 0, 0,-3,-2, 1,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 3},
/* F */ { -4,-5,-4,-6,-5, 9,-5,-2, 1, 0,-5, 2, 0,-4,_M,-5,-5,-4,-3,-3, 0,-1, 0, 0, 7,-5},
/* G */ {  1, 0,-3, 1, 0,-5, 5,-2,-3, 0,-2,-4,-3, 0,_M,-1,-1,-3, 1, 0, 0,-1,-7, 0,-5, 0},
/* H */ { -1, 1,-3, 1, 1,-2,-2, 6,-2, 0, 0,-2,-2, 2,_M, 0, 3, 2,-1,-1, 0,-2,-3, 0, 0, 2},
/* I */ { -1,-2,-2,-2,-2, 1,-3,-2, 5, 0,-2, 2, 2,-2,_M,-2,-2,-2,-1, 0, 0, 4,-5, 0,-1,-2},
/* J */ {  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* K */ { -1, 0,-5, 0, 0,-5,-2, 0,-2, 0, 5,-3, 0, 1,_M,-1, 1, 3, 0, 0, 0,-2,-3, 0,-4, 0},
/* L */ { -2,-3,-6,-4,-3, 2,-4,-2, 2, 0,-3, 6, 4,-3,_M,-3,-2,-3,-3,-1, 0, 2,-2, 0,-1,-2},
/* M */ { -1,-2,-5,-3,-2, 0,-3,-2, 2, 0, 0, 4, 6,-2,_M,-2,-1, 0,-2,-1, 0, 2,-4, 0,-2,-1},
/* N */ {  0, 2,-4, 2, 1,-4, 0, 2,-2, 0, 1,-3,-2, 2,_M,-1, 1, 0, 1, 0, 0,-2,-4, 0,-2, 1},
/* O */ { _M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M, 0,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M},
/* P */ {  1,-1,-3,-1,-1,-5,-1, 0,-2, 0,-1,-3,-2,-1,_M, 6, 0, 0, 1, 0, 0,-1,-6, 0,-5, 0},
/* Q */ {  0, 1,-5, 2, 2,-5,-1, 3,-2, 0, 1,-2,-1, 1,_M, 0, 4, 1,-1,-1, 0,-2,-5, 0,-4, 3},
/* R */ { -2, 0,-4,-1,-1,-4,-3, 2,-2, 0, 3,-3, 0, 0,_M, 0, 1, 6, 0,-1, 0,-2, 2, 0,-4, 0},
/* S */ {  1, 0, 0, 0, 0,-3, 1,-1,-1, 0, 0,-3,-2, 1,_M, 1,-1, 0, 2, 1, 0,-1,-2, 0,-3, 0},
/* T */ {  1, 0,-2, 0, 0,-3, 0,-1, 0, 0, 0,-1,-1, 0,_M, 0,-1,-1, 1, 3, 0, 0,-5, 0,-3, 0},
/* U */ {  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* V */ {  0,-2,-2,-2,-2,-1,-1,-2, 4, 0,-2, 2, 2,-2,_M,-1,-2,-2,-1, 0, 0, 4,-6, 0,-2,-2},
/* W */ { -6,-5,-8,-7,-7, 0,-7,-3,-5, 0,-3,-2,-4,-4,_M,-6,-5, 2,-2,-5, 0,-6,17, 0, 0,-6},
/* X */ {  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* Y */ { -3,-3, 0,-4,-4, 7,-5, 0,-1, 0,-4,-1,-2,-2,_M,-5,-4,-4,-3,-3, 0,-2, 0, 0,10,-4},
/* Z */ {  0, 1,-5, 2, 3,-5, 0, 2,-2, 0, 0,-2,-1, 1,_M, 0, 3, 0, 0, 0, 0,-2,-6, 0,-4, 4}
};
/*
 */
#include <stdio.h>
#include <ctype.h>

#define MAXJMP          16      /* max jumps in a diag */
#define MAXGAP          24      /* don't continue to penalize gaps larger than this */
#define JMPS            1024    /* max jmps in a path */
#define MX              4       /* save if there's at least MX-1 bases since last jmp */

#define DMAT            3       /* value of matching bases */
#define DMIS            0       /* penalty for mismatched bases */
#define DINS0           8       /* penalty for a gap */
#define DINS1           1       /* penalty per base */
#define PINS0           8       /* penalty for a gap */
#define PINS1           4       /* penalty per residue */

struct jmp {
    short           n[MAXJMP];  /* size of jmp (neg for dely) */
    unsigned short  x[MAXJMP];  /* base no. of jmp in seq x */
};                              /* limits seq to 2^16 -1 */

struct diag {
    int             score;      /* score at last jmp */
    long            offset;     /* offset of prev block */
    short           ijmp;       /* current jmp index */
    struct jmp      jp;         /* list of jmps */
};

struct path {
    int     spc;                /* number of leading spaces */
    short   n[JMPS];            /* size of jmp (gap) */
    int     x[JMPS];            /* loc of jmp (last elem before gap) */
};

char            *ofile;         /* output file name */
char            *namex[2];      /* seq names: getseqs() */
char            *prog;          /* prog name for err msgs */
char            *seqx[2];       /* seqs: getseqs() */
int             dmax;           /* best diag: nw() */
int             dmax0;          /* final diag */
int             dna;            /* set if dna: main() */
int             endgaps;        /* set if penalizing end gaps */
int             gapx, gapy;     /* total gaps in seqs */
int             len0, len1;     /* seq lens */
int             ngapx, ngapy;   /* total size of gaps */
int             smax;           /* max score: nw() */
int             *xbm;           /* bitmap for matching */
long            offset;         /* current offset in jmp file */
struct  diag    *dx;            /* holds diagonals */
struct  path    pp[2];          /* holds path for seqs */

char            *calloc(), *malloc(), *index(), *strcpy();
char            *getseq(), *g_calloc();
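
The gap constants in nw.h combine into an affine penalty: opening a gap costs DINS0 (or PINS0 for protein), each gapped position costs DINS1 (or PINS1), and positions beyond MAXGAP add nothing further. The sketch below is one reading of how nw() applies these constants when end gaps are not penalized. The function `gap_penalty` is hypothetical and is not part of the listing.

```c
#define MAXGAP  24  /* don't continue to penalize gaps larger than this */
#define DINS0   8   /* DNA: penalty for a gap */
#define DINS1   1   /* DNA: penalty per base */
#define PINS0   8   /* protein: penalty for a gap */
#define PINS1   4   /* protein: penalty per residue */

/* Hypothetical helper (not in ALIGN-2): total penalty charged for a single
 * gap spanning n positions, assuming end gaps are not being penalized.
 * The per-position charge stops growing once the gap reaches MAXGAP. */
int gap_penalty(int n, int is_dna)
{
    int ins0 = is_dna ? DINS0 : PINS0;
    int ins1 = is_dna ? DINS1 : PINS1;

    if (n <= 0)
        return 0;       /* no gap, no penalty */
    if (n > MAXGAP)
        n = MAXGAP;     /* cap the extension cost */
    return ins0 + ins1 * n;
}
```

Under these assumptions a three-residue protein gap costs 8 + 3*4 = 20, and any DNA gap of 24 bases or more costs 8 + 24*1 = 32.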
/* Needleman-Wunsch alignment program
 *
 * usage: progs file1 file2
 *  where file1 and file2 are two dna or two protein sequences.
 *  The sequences can be in upper- or lower-case and may contain ambiguity
 *  Any lines beginning with ';', '>' or '<' are ignored
 *  Max file length is 65535 (limited by unsigned short x in the jmp struct)
 *  A sequence with 1/3 or more of its elements ACGTU is assumed to be DNA
 *  Output is in the file "align.out"
 *
 * The program may create a tmp file in /tmp to hold info about traceback.
 * Original version developed under BSD 4.3 on a vax 8650
 */
#include "nw.h"
#include "day.h"

static  _dbval[26] = {
    1,14,2,13,0,0,4,11,0,0,12,0,3,15,0,0,0,5,6,8,8,7,9,0,10,0
};

static  _pbval[26] = {
    1, 2|(1<<('D'-'A'))|(1<<('N'-'A')), 4, 8, 16, 32, 64,
    128, 256, 0xFFFFFFF, 1<<10, 1<<11, 1<<12, 1<<13, 1<<14,
    1<<15, 1<<16, 1<<17, 1<<18, 1<<19, 1<<20, 1<<21, 1<<22,
    1<<23, 1<<24, 1<<25|(1<<('E'-'A'))|(1<<('Q'-'A'))
};

main(ac, av)
    int     ac;
    char    *av[];
{
    prog = av[0];
    if (ac != 3) {
        fprintf(stderr, "usage: %s file1 file2\n", prog);
        fprintf(stderr, "where file1 and file2 are two dna or two protein sequences.\n");
        fprintf(stderr, "The sequences can be in upper- or lower-case\n");
        fprintf(stderr, "Any lines beginning with ';' or '<' are ignored\n");
        fprintf(stderr, "Output is in the file \"align.out\"\n");
        exit(1);
    }
    namex[0] = av[1];
    namex[1] = av[2];
    seqx[0] = getseq(namex[0], &len0);
    seqx[1] = getseq(namex[1], &len1);
    xbm = (dna)? _dbval : _pbval;

    endgaps = 0;                /* 1 to penalize endgaps */
    ofile = "align.out";        /* output file */

    nw();               /* fill in the matrix, get the possible jmps */
    readjmps();         /* get the actual jmps */
    print();            /* print stats, alignment */

    cleanup(0);         /* unlink any tmp files */
}
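
The `_dbval` table in nw.c encodes each nucleotide letter as a 4-bit mask (A=1, C=2, G=4, T/U=8), so an ambiguity code is the OR of the bases it stands for, and two letters match when their masks share a bit. A minimal sketch of that test follows. The table values are copied from the listing, while the names `dbval` and `bases_match` and the range check are assumptions added for illustration.

```c
/* Bit masks for letters A..Z, copied from _dbval in nw.c:
 * A=1, C=2, G=4, T=U=8; ambiguity codes are ORs (e.g. M=A|C=3, N=15). */
static const int dbval[26] = {
    1,14,2,13,0,0,4,11,0,0,12,0,3,15,0,0,0,5,6,8,8,7,9,0,10,0
};

/* Hypothetical helper (not in ALIGN-2): nonzero when two upper-case
 * nucleotide codes are compatible, i.e. their masks share at least one bit.
 * Characters outside 'A'..'Z' never match; the original code assumes
 * upper-case input and does no such check. */
int bases_match(char a, char b)
{
    if (a < 'A' || a > 'Z' || b < 'A' || b > 'Z')
        return 0;
    return (dbval[a - 'A'] & dbval[b - 'A']) != 0;
}
```

For instance, `bases_match('A', 'M')` is 1 because M stands for A or C, whereas `bases_match('R', 'Y')` is 0 because purines (mask 5) and pyrimidines (mask 10) share no bit.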
/* do the alignment, return best score: main()
 * dna: values in Fitch and Smith, PNAS, 80, 1382-1386, 1983
 * pro: PAM 250 values
 * When scores are equal, we prefer mismatches to any gap, prefer
 * a new gap to extending an ongoing gap, and prefer a gap in seqx
 * to a gap in seq y.
 */
nw()
{
    char            *px, *py;       /* seqs and ptrs */
    int             *ndely, *dely;  /* keep track of dely */
    int             ndelx, delx;    /* keep track of delx */
    int             *tmp;           /* for swapping row0, row1 */
    int             mis;            /* score for each type */
    int             ins0, ins1;     /* insertion penalties */
    register        id;             /* diagonal index */
    register        ij;             /* jmp index */
    register        *col0, *col1;   /* score for curr, last row */
    register        xx, yy;         /* index into seqs */

    dx = (struct diag *)g_calloc("to get diags", len0+len1+1, sizeof(struct diag));

    ndely = (int *)g_calloc("to get ndely", len1+1, sizeof(int));
    dely = (int *)g_calloc("to get dely", len1+1, sizeof(int));
    col0 = (int *)g_calloc("to get col0", len1+1, sizeof(int));
    col1 = (int *)g_calloc("to get col1", len1+1, sizeof(int));
    ins0 = (dna)? DINS0 : PINS0;
    ins1 = (dna)? DINS1 : PINS1;

    smax = -10000;
    if (endgaps) {
        for (col0[0] = dely[0] = -ins0, yy = 1; yy <= len1; yy++) {
            col0[yy] = dely[yy] = col0[yy-1] - ins1;
            ndely[yy] = yy;
        }
        col0[0] = 0;    /* Waterman Bull Math Biol 84 */
    }
    else
        for (yy = 1; yy <= len1; yy++)
            dely[yy] = -ins0;

    /* fill in match matrix
     */
    for (px = seqx[0], xx = 1; xx <= len0; px++, xx++) {
        /* initialize first entry in col
         */
        if (endgaps) {
            if (xx == 1)
                col1[0] = delx = -(ins0+ins1);
            else
                col1[0] = delx = col0[0] - ins1;
            ndelx = xx;
        }
        else {
            col1[0] = 0;
            delx = -ins0;
            ndelx = 0;
        }



Page 2 of nw.c 



FIGURE 4E 

        for (py = seqx[1], yy = 1; yy <= len1; py++, yy++) {
            mis = col0[yy-1];
            if (dna)
                mis += (xbm[*px-'A']&xbm[*py-'A'])? DMAT : DMIS;
            else
                mis += _day[*px-'A'][*py-'A'];

            /* update penalty for del in x seq;
             * favor new del over ongoing del
             * ignore MAXGAP if weighting endgaps
             */
            if (endgaps || ndely[yy] < MAXGAP) {
                if (col0[yy] - ins0 >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else {
                    dely[yy] -= ins1;
                    ndely[yy]++;
                }
            } else {
                if (col0[yy] - (ins0+ins1) >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else
                    ndely[yy]++;
            }

            /* update penalty for del in y seq;
             * favor new del over ongoing del
             */
            if (endgaps || ndelx < MAXGAP) {
                if (col1[yy-1] - ins0 >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else {
                    delx -= ins1;
                    ndelx++;
                }
            } else {
                if (col1[yy-1] - (ins0+ins1) >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else
                    ndelx++;
            }

            /* pick the maximum score; we're favoring
             * mis over any del and delx over dely
             */
            id = xx - yy + len1 - 1;
            if (mis >= delx && mis >= dely[yy])
                col1[yy] = mis;
            else if (delx >= dely[yy]) {
                col1[yy] = delx;
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndelx >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = ndelx;
                dx[id].jp.x[ij] = xx;
                dx[id].score = delx;
            }
            else {
                col1[yy] = dely[yy];
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndely[yy] >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = -ndely[yy];
                dx[id].jp.x[ij] = xx;
                dx[id].score = dely[yy];
            }
            if (xx == len0 && yy < len1) {
                /* last col
                 */
                if (endgaps)
                    col1[yy] -= ins0+ins1*(len1-yy);
                if (col1[yy] > smax) {
                    smax = col1[yy];
                    dmax = id;
                }
            }
        }
        if (endgaps && xx < len0)
            col1[yy-1] -= ins0+ins1*(len0-xx);
        if (col1[yy-1] > smax) {
            smax = col1[yy-1];
            dmax = id;
        }
        tmp = col0; col0 = col1; col1 = tmp;
    }
    (void) free((char *)ndely);
    (void) free((char *)dely);
    (void) free((char *)col0);
    (void) free((char *)col1);
}

Page 4 of nw.c
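
nw() files every cell of the score matrix under a diagonal index, `id = xx - yy + len1 - 1`, where xx and yy are 1-based positions in the first and second sequence. The sketch below restates that mapping so its range is easy to check. The function names are hypothetical and only the formula itself comes from the listing.

```c
/* Hypothetical helper: diagonal index used by nw() for cell (xx, yy),
 * with 1-based xx in 1..len0 and yy in 1..len1. */
int diag_index(int xx, int yy, int len1)
{
    return xx - yy + len1 - 1;
}

/* Hypothetical helper: number of distinct diagonals that can occur,
 * since the index runs from 0 (xx = 1, yy = len1) up to
 * len0 + len1 - 2 (xx = len0, yy = 1). */
int diag_count(int len0, int len1)
{
    return len0 + len1 - 1;
}
```

The main diagonal (xx equal to yy) maps to `len1 - 1`, which is the value print() compares `dmax` against when deciding whether the alignment starts with a leading gap. The listing allocates `len0+len1+1` diag structs, slightly more than this count requires.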



FIGURE 4G 

/*
 * print() -- only routine visible outside this module
 *
 * static:
 * getmat() -- trace back best path, count matches: print()
 * pr_align() -- print alignment of described in array p[]: print()
 * dumpblock() -- dump a block of lines with numbers, stars: pr_align()
 * nums() -- put out a number line: dumpblock()
 * putline() -- put out a line (name, [num], seq, [num]): dumpblock()
 * stars() -- put a line of stars: dumpblock()
 * stripname() -- strip any path and prefix from a seqname
 */

#include "nw.h"

#define SPC     3
#define P_LINE  256     /* maximum output line */
#define P_SPC   3       /* space between name or num and seq */

extern  _day[26][26];
int     olen;           /* set output line length */
FILE    *fx;            /* output file */

print()
{
    int     lx, ly, firstgap, lastgap;      /* overlap */

    if ((fx = fopen(ofile, "w")) == 0) {
        fprintf(stderr, "%s: can't write %s\n", prog, ofile);
        cleanup(1);
    }
    fprintf(fx, "<first sequence: %s (length = %d)\n", namex[0], len0);
    fprintf(fx, "<second sequence: %s (length = %d)\n", namex[1], len1);
    olen = 60;
    lx = len0;
    ly = len1;
    firstgap = lastgap = 0;
    if (dmax < len1 - 1) {          /* leading gap in x */
        pp[0].spc = firstgap = len1 - dmax - 1;
        ly -= pp[0].spc;
    }
    else if (dmax > len1 - 1) {     /* leading gap in y */
        pp[1].spc = firstgap = dmax - (len1 - 1);
        lx -= pp[1].spc;
    }
    if (dmax0 < len0 - 1) {         /* trailing gap in x */
        lastgap = len0 - dmax0 - 1;
        lx -= lastgap;
    }
    else if (dmax0 > len0 - 1) {    /* trailing gap in y */
        lastgap = dmax0 - (len0 - 1);
        ly -= lastgap;
    }
    getmat(lx, ly, firstgap, lastgap);
    pr_align();
}
/*
 * trace back the best path, count matches
 */
static
getmat(lx, ly, firstgap, lastgap)
    int     lx, ly;                 /* "core" (minus endgaps) */
    int     firstgap, lastgap;      /* leading trailing overlap */
{
    int             nm, i0, i1, siz0, siz1;
    char            outx[32];
    double          pct;
    register        n0, n1;
    register char   *p0, *p1;

    /* get total matches, score
     */
    i0 = i1 = siz0 = siz1 = 0;
    p0 = seqx[0] + pp[1].spc;
    p1 = seqx[1] + pp[0].spc;
    n0 = pp[1].spc + 1;
    n1 = pp[0].spc + 1;

    nm = 0;
    while ( *p0 && *p1 ) {
        if (siz0) {
            p1++;
            n1++;
            siz0--;
        }
        else if (siz1) {
            p0++;
            n0++;
            siz1--;
        }
        else {
            if (xbm[*p0-'A']&xbm[*p1-'A'])
                nm++;
            if (n0++ == pp[0].x[i0])
                siz0 = pp[0].n[i0++];
            if (n1++ == pp[1].x[i1])
                siz1 = pp[1].n[i1++];
            p0++;
            p1++;
        }
    }

    /* pct homology:
     * if penalizing endgaps, base is the shorter seq
     * else, knock off overhangs and take shorter core
     */
    if (endgaps)
        lx = (len0 < len1)? len0 : len1;
    else
        lx = (lx < ly)? lx : ly;
    pct = 100.*(double)nm/(double)lx;
    fprintf(fx, "\n");
    fprintf(fx, "<%d match%s in an overlap of %d: %.2f percent similarity\n",
        nm, (nm == 1)? "" : "es", lx, pct);
    fprintf(fx, "<gaps in first sequence: %d", gapx);
    if (gapx) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapx, (dna)? "base" : "residue", (ngapx == 1)? "" : "s");
        fprintf(fx, "%s", outx);
    }

    fprintf(fx, ", gaps in second sequence: %d", gapy);
    if (gapy) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapy, (dna)? "base" : "residue", (ngapy == 1)? "" : "s");
        fprintf(fx, "%s", outx);
    }
    if (dna)
        fprintf(fx,
        "\n<score: %d (match = %d, mismatch = %d, gap penalty = %d + %d per base)\n",
        smax, DMAT, DMIS, DINS0, DINS1);
    else
        fprintf(fx,
        "\n<score: %d (Dayhoff PAM 250 matrix, gap penalty = %d + %d per residue)\n",
        smax, PINS0, PINS1);
    if (endgaps)
        fprintf(fx,
        "<endgaps penalized, left endgap: %d %s%s, right endgap: %d %s%s\n",
        firstgap, (dna)? "base" : "residue", (firstgap == 1)? "" : "s",
        lastgap, (dna)? "base" : "residue", (lastgap == 1)? "" : "s");
    else
        fprintf(fx, "<endgaps not penalized\n");
}

static          nm;             /* matches in core -- for checking */
static          lmax;           /* lengths of stripped file names */
static          ij[2];          /* jmp index for a path */
static          nc[2];          /* number at start of current line */
static          ni[2];          /* current elem number -- for gapping */
static          siz[2];
static char     *ps[2];         /* ptr to current element */
static char     *po[2];         /* ptr to next output char slot */
static char     out[2][P_LINE]; /* output line */
static char     star[P_LINE];   /* set by stars() */

/*
 * print alignment of described in struct path pp[]
 */
static
pr_align()
{
    int             nn;     /* char count */
    int             more;
    register        i;

    for (i = 0, lmax = 0; i < 2; i++) {
        nn = stripname(namex[i]);
        if (nn > lmax)
            lmax = nn;

        nc[i] = 1;
        ni[i] = 1;
        siz[i] = ij[i] = 0;
        ps[i] = seqx[i];
        po[i] = out[i];
    }
    for (nn = nm = 0, more = 1; more; ) {
        for (i = more = 0; i < 2; i++) {
            /*
             * do we have more of this sequence?
             */
            if (!*ps[i])
                continue;

            more++;

            if (pp[i].spc) {        /* leading space */
                *po[i]++ = ' ';
                pp[i].spc--;
            }
            else if (siz[i]) {      /* in a gap */
                *po[i]++ = '-';
                siz[i]--;
            }
            else {          /* we're putting a seq element
                             */
                *po[i] = *ps[i];
                if (islower(*ps[i]))
                    *ps[i] = toupper(*ps[i]);
                po[i]++;
                ps[i]++;

                /*
                 * are we at next gap for this seq?
                 */
                if (ni[i] == pp[i].x[ij[i]]) {
                    /*
                     * we need to merge all gaps
                     * at this location
                     */
                    siz[i] = pp[i].n[ij[i]++];
                    while (ni[i] == pp[i].x[ij[i]])
                        siz[i] += pp[i].n[ij[i]++];
                }
                ni[i]++;
            }
        }
        if (++nn == olen || !more && nn) {
            dumpblock();
            for (i = 0; i < 2; i++)
                po[i] = out[i];
            nn = 0;
        }
    }
}
/*
 * dump a block of lines, including numbers, stars: pr_align()
 */
static
dumpblock()
{
    register        i;

    for (i = 0; i < 2; i++)
        *po[i]-- = '\0';

    (void) putc('\n', fx);
    for (i = 0; i < 2; i++) {
        if (*out[i] && (*out[i] != ' ' || *(po[i]) != ' ')) {
            if (i == 0)
                nums(i);
            if (i == 0 && *out[1])
                stars();
            putline(i);
            if (i == 0 && *out[1])
                fprintf(fx, star);
            if (i == 1)
                nums(i);
        }
    }
}

/*
 * put out a number line: dumpblock()
 */
static
nums(ix)
    int     ix;     /* index in out[] holding seq line */
{
    char            nline[P_LINE];
    register        i, j;
    register char   *pn, *px, *py;

    for (pn = nline, i = 0; i < lmax+P_SPC; i++, pn++)
        *pn = ' ';
    for (i = nc[ix], py = out[ix]; *py; py++, pn++) {
        if (*py == ' ' || *py == '-')
            *pn = ' ';
        else {
            if (i%10 == 0 || (i == 1 && nc[ix] != 1)) {
                j = (i < 0)? -i : i;
                for (px = pn; j; j /= 10, px--)
                    *px = j%10 + '0';
                if (i < 0)
                    *px = '-';
            }
            else
                *pn = ' ';
            i++;
        }
    }
    *pn = '\0';
    nc[ix] = i;
    for (pn = nline; *pn; pn++)
        (void) putc(*pn, fx);
    (void) putc('\n', fx);
}

/*
 * put out a line (name, [num], seq, [num]): dumpblock()
 */
static
putline(ix)
    int     ix;
{
    int             i;
    register char   *px;

    for (px = namex[ix], i = 0; *px && *px != ':'; px++, i++)
        (void) putc(*px, fx);
    for (; i < lmax+P_SPC; i++)
        (void) putc(' ', fx);

    /* these count from 1:
     * ni[] is current element (from 1)
     * nc[] is number at start of current line
     */
    for (px = out[ix]; *px; px++)
        (void) putc(*px&0x7F, fx);
    (void) putc('\n', fx);
}

/*
 * put a line of stars (seqs always in out[0], out[1]): dumpblock()
 */
static
stars()
{
    int             i;
    register char   *p0, *p1, cx, *px;

    if (!*out[0] || (*out[0] == ' ' && *(po[0]) == ' ') ||
        !*out[1] || (*out[1] == ' ' && *(po[1]) == ' '))
        return;
    px = star;
    for (i = lmax+P_SPC; i; i--)
        *px++ = ' ';

    for (p0 = out[0], p1 = out[1]; *p0 && *p1; p0++, p1++) {
        if (isalpha(*p0) && isalpha(*p1)) {
            if (xbm[*p0-'A']&xbm[*p1-'A']) {
                cx = '*';
                nm++;
            }
            else if (!dna && _day[*p0-'A'][*p1-'A'] > 0)
                cx = '.';
            else
                cx = ' ';
        }
        else
            cx = ' ';
        *px++ = cx;
    }
    *px++ = '\n';
    *px = '\0';
}
/*
 * strip path or prefix from pn, return len: pr_align()
 */
static
stripname(pn)
    char    *pn;    /* file name (may be path) */
{
    register char   *px, *py;

    py = 0;
    for (px = pn; *px; px++)
        if (*px == '/')
            py = px + 1;
    if (py)
        (void) strcpy(pn, py);
    return(strlen(pn));
}



Page 7 of nwprint.c 



FIGURE 4N 



/*
 * cleanup() -- cleanup any tmp file
 * getseq() -- read in seq, set dna, len, maxlen
 * g_calloc() -- calloc() with error checkin
 * readjmps() -- get the good jmps, from tmp file if necessary
 * writejmps() -- write a filled array of jmps to a tmp file: nw()
 */
#include "nw.h"
#include <sys/file.h>

char    *jname = "/tmp/homgXXXXXX";     /* tmp file for jmps */
FILE    *fj;

int     cleanup();                      /* cleanup tmp file */
long    lseek();

/*
 * remove any tmp file if we blow
 */
cleanup(i)
    int     i;
{
    if (fj)
        (void) unlink(jname);
    exit(i);
}

/*
 * read, return ptr to seq, set dna, len, maxlen
 * skip lines starting with ';', '<', or '>'
 * seq in upper or lower case
 */
char    *
getseq(file, len)
    char    *file;  /* file name */
    int     *len;   /* seq len */
{
    char            line[1024], *pseq;
    register char   *px, *py;
    int             natgc, tlen;
    FILE            *fp;

    if ((fp = fopen(file, "r")) == 0) {
        fprintf(stderr, "%s: can't read %s\n", prog, file);
        exit(1);
    }
    tlen = natgc = 0;
    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++)
            if (isupper(*px) || islower(*px))
                tlen++;
    }
    if ((pseq = malloc((unsigned)(tlen+6))) == 0) {
        fprintf(stderr, "%s: malloc() failed to get %d bytes for %s\n", prog, tlen+6, file);
        exit(1);
    }
    pseq[0] = pseq[1] = pseq[2] = pseq[3] = '\0';
    py = pseq + 4;
    *len = tlen;
    rewind(fp);

    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++) {
            if (isupper(*px))
                *py++ = *px;
            else if (islower(*px))
                *py++ = toupper(*px);
            if (index("ATGCU", *(py-1)))
                natgc++;
        }
    }
    *py++ = '\0';
    *py = '\0';
    (void) fclose(fp);
    dna = natgc > (tlen/3);
    return(pseq+4);
}

char    *
g_calloc(msg, nx, sz)
    char    *msg;           /* program, calling routine */
    int     nx, sz;         /* number and size of elements */
{
    char            *px, *calloc();

    if ((px = calloc((unsigned)nx, (unsigned)sz)) == 0) {
        if (*msg) {
            fprintf(stderr, "%s: g_calloc() failed %s (n=%d, sz=%d)\n", prog, msg, nx, sz);
            exit(1);
        }
    }
    return(px);
}

/*
 * get final jmps from dx[] or tmp file, set pp[], reset dmax: main()
 */
readjmps()
{
    int             fd = -1;
    int             siz, i0, i1;
    register        i, j, xx;

    if (fj) {
        (void) fclose(fj);
        if ((fd = open(jname, O_RDONLY, 0)) < 0) {
            fprintf(stderr, "%s: can't open() %s\n", prog, jname);
            cleanup(1);
        }
    }
    for (i = i0 = i1 = 0, dmax0 = dmax, xx = len0; ; i++) {
        while (1) {
            for (j = dx[dmax].ijmp; j >= 0 && dx[dmax].jp.x[j] >= xx; j--)
                ;

Page 2 of nwsubr.c



FIGURE 4P 

            if (j < 0 && dx[dmax].offset && fj) {
                (void) lseek(fd, dx[dmax].offset, 0);
                (void) read(fd, (char *)&dx[dmax].jp, sizeof(struct jmp));
                (void) read(fd, (char *)&dx[dmax].offset, sizeof(dx[dmax].offset));
                dx[dmax].ijmp = MAXJMP-1;
            }
            else
                break;
        }
        if (i >= JMPS) {
            fprintf(stderr, "%s: too many gaps in alignment\n", prog);
            cleanup(1);
        }
        if (j >= 0) {
            siz = dx[dmax].jp.n[j];
            xx = dx[dmax].jp.x[j];
            dmax += siz;
            if (siz < 0) {          /* gap in second seq */
                pp[1].n[i1] = -siz;
                xx += siz;
                /* id = xx - yy + len1 - 1
                 */
                pp[1].x[i1] = xx - dmax + len1 - 1;
                gapy++;
                ngapy -= siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (-siz < MAXGAP || endgaps)? -siz : MAXGAP;
                i1++;
            }
            else if (siz > 0) {     /* gap in first seq */
                pp[0].n[i0] = siz;
                pp[0].x[i0] = xx;
                gapx++;
                ngapx += siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (siz < MAXGAP || endgaps)? siz : MAXGAP;
                i0++;
            }
        }
        else
            break;
    }

    /* reverse the order of jmps
     */
    for (j = 0, i0--; j < i0; j++, i0--) {
        i = pp[0].n[j]; pp[0].n[j] = pp[0].n[i0]; pp[0].n[i0] = i;
        i = pp[0].x[j]; pp[0].x[j] = pp[0].x[i0]; pp[0].x[i0] = i;
    }
    for (j = 0, i1--; j < i1; j++, i1--) {
        i = pp[1].n[j]; pp[1].n[j] = pp[1].n[i1]; pp[1].n[i1] = i;
        i = pp[1].x[j]; pp[1].x[j] = pp[1].x[i1]; pp[1].x[i1] = i;
    }
    if (fd >= 0)
        (void) close(fd);
    if (fj) {
        (void) unlink(jname);
        fj = 0;
        offset = 0;
    }
}

Page 3 of nwsubr.c



FIGURE 4Q 



/*
 * write a filled jmp struct offset of the prev one (if any): nw()
 */
writejmps(ix)
    int     ix;
{
    char    *mktemp();

    if (!fj) {
        if (mktemp(jname) < 0) {
            fprintf(stderr, "%s: can't mktemp() %s\n", prog, jname);
            cleanup(1);
        }
        if ((fj = fopen(jname, "w")) == 0) {
            fprintf(stderr, "%s: can't write %s\n", prog, jname);
            exit(1);
        }
    }
    (void) fwrite((char *)&dx[ix].jp, sizeof(struct jmp), 1, fj);
    (void) fwrite((char *)&dx[ix].offset, sizeof(dx[ix].offset), 1, fj);
}
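
getseq() decides between DNA and protein scoring with a simple composition test: it counts the letters that are A, T, G, C or U and sets `dna` when that count exceeds one third of the sequence length (`natgc > tlen/3`, using integer division). A self-contained sketch of the same heuristic is shown below. The function name `looks_like_dna` is an assumption, and unlike getseq() it works on an in-memory string and uses the standard `strchr` rather than the BSD `index`.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not in ALIGN-2): apply the getseq() composition test
 * to a string. Only alphabetic characters are counted; a sequence is
 * treated as DNA when more than tlen/3 (integer division) of its letters
 * are A, T, G, C or U, in either case. */
int looks_like_dna(const char *seq)
{
    int tlen = 0;   /* number of letters seen */
    int natgc = 0;  /* number of those that are A, T, G, C or U */

    for (; *seq; seq++) {
        unsigned char c = (unsigned char)*seq;
        if (!isalpha(c))
            continue;
        tlen++;
        if (strchr("ATGCU", toupper(c)))
            natgc++;
    }
    return natgc > tlen / 3;
}
```

Applied to the first 24 residues of the Figure 2 protein (`MSSQPAGNQTSPGATEDYSYGSWY`), seven letters fall in the ATGCU set, which does not exceed 24/3 = 8, so the sequence is scored as protein.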
GTGCTCTCCGAGGACAAGCAGGAGGNGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTG

GAAGTGTGCTACATCTCAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTCCTGATG 

CGCTCACTGGTGACACACAGGACCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGAC 

TTGAGTCCCTTGCATCGGAGTCCCCATCCCTCCCGCCAAGCCATATTCTGTTGGATGAGC

TTCAGTGCCTACCAGACAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTC

TTCCTGGGAACCACGGCCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAAC

CTCCTGCTCTTCCGTTCCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCT 

GTGATCCTGCAGAACATGGCAGCCCATTGGGTCTTCCTGGAGACTCATGATGGACACCCA 

CAGCTGACCAACCGGCGAGTGCTCTATGCAGCCACCTTTCTTCTCTTCCCCCTCAATGTG 

CTGGTGGGTGCCATGGTGGCCACCTGGCGAGTGCTCCTCTCTGCCCTCTACAACGCCATC 

CACCTTGGCCAGATGGACCTCAGCCTGCTGCCACCGAGAGCCGCCACTCTCGACCCCGGC 

TACTACACGTACCGAA 



FIGURE 6 



CACAACCAGCCACCCCTCTAGGATCCCAGCCCAGCTGGTGCTGGGCTCAGAGGAGAAGGC
CCCGTGTTGGGAGCACCCTGCTTGCCTGGAGGGACAAGTTTCCGGGAGAGATCAATAAAG 
GAAAGGAAAGAGACAAGGAAGGGAGAGGTCAGGAGAGCGCTTGATTGGAGGAGAAGGGCC
AGAGAATGTCGTCCCAGCCAGCAGGGAACCAGACCTCCCCCGGGGCCACAGAGGACTACT
CCTATGGCAGCTGGTACATCGATGAGCCCCAGGGGGGCGAGGAGCTCCAGCCAGAGGGGG 

AAGTGCCCTCCTGCCACACCAGCATACCACCCGGCCTGTACCACGCCTGCCTGGCCTCGC
TGTCAATCCTTGTGCTGCTGCTCCTGGCCATGCTGGTGAGGCGCCGCCAGCTCTGGCCTG 
ACTGTGTGCGTGGCAGGCCCGGCCTGCCCAGGCCCCGGGCAGTGCCTGCTGCTGTTTTCA 
TGGTCCTCCTGAGCTCCCTGTGTTTGCTGCTCCCCGACGAGGACGCATTGCCCTTCCTGA 
CTCTCGCCTCAGCACCCAGCCAAGATGGGAAAACTGAGGCTCCAAGAGGGGCCTGGAAGA 

TACTGGGACTGTTCTATTATGCTGCCCTCTACTACCCTCTGGCTGCCTGTGCCACGGCTG
GCCACACAGCTGCACACCTGCTCGGCAGCACGCTGTCCTGGGCCCACCTTGGGGTCCAGG 
TCTGGCAGAGGGCAGAGTGTCCCCAGGTGCCCAAGATCTACAAGTACTACTCCCTGCTGG 
CCTCCCTGCCTCTCCTGCTGGGCCTCGGATTCCTGAGCCTTTGGTACCCTGTGCAGCTGG 
TGAGAAGCTTCAGCCGTAGGACAGGAGCAGGCTCCAAGGGGCTGCAGAGCAGCTACTCTG

AGGAATATCTGAGGAACCTCCTTTGCAGGAAGAAGCTGGGAAGCAGCTACCACACCTCCA
AGCATGGCTTCCTGTCCTGGGCCCGCGTCTGCTTGAGACACTGCATCTACACTCCACAGC 
CAGGATTCCATCTCCCGCTGAAGCTGGTGCTTTCAGCTACACTGACAGGGACGGCCATTT 
ACCAGGTGGCCCTGCTGCTGCTGGTGGGCGTGGTACCCACTATCCAGAAGGTGAGGGCAG 
GGGTCACCACGGATGTCTCCTACCTGCTGGCCGGCTTTGGAATCGTGCTCTCCGAGGACA 

AGCAGGAGGTGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTGGAAGTGTGCTACATCT
CAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTCCTGATGCGCTCACTGGTGACAC
ACAGGACCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGACTTGAGTCCCTTGCATC
GGAGTCCCCATCCCTCCCGCCAAGCCATATTCTGTTGGATGAGCTTCAGTGCCTACCAGA
CAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTCTTCCTGGGAACCACGG

CCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAACCTCCTGCTCTTCCGTT
CCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCTGTGATCCTGCAGAACA 
TGGCAGCCCATTGGGTCTTCCTGGAGACTCATGATGGACACCCACAGCTGACCAACCGGC 
GAGTGCTCTATGCAGCCACCTTTCTTCTCTTCCCCCTCAATGTGCTGGTGGGTGCCATAG 
TGGCCACCTGGCGAGTGCTCCTCTCTGCCCTCTACAACGCCATCCACCTTGGCCAGATGG 

ACCTCAGCCTGCTGCCACCGAGAGCCGCCACTCTCGACCCCGGCTACTACACGTACCGAA
ACTTCTTGAAGATTGAAGTCAGCCAGTCGCATCCAGCCATGACAGCCTTCTGCTCCCTGC 
TCCTGCAAGCGCAGAGCCTCCTACCCAGGACCATGGCAGCCCCCCAGGACAGCCTCAGAC 
CAGGGGAGGAAGACGAAGGGATGCAGCTGCTACAGACAAAGGACTCCATGGCCAAGGGAG 
CTAGGCCCGGGGCCAGCCGCGGCAGGGCTCGCTGGGGTCTGGCCTACACGCTGCTGCACA 

ACCCAACCCTGCAGGTCTTCCGCAAGACGGCCCTGTTGGGTGCCAATGGTGCCCAGCCCT
GAGGGCAGGGAAGGTCAACCCACCTGCCCATCTGTGCTGAGGCATGTTCCTGCCTACCAC 
CTCCTCCCTCCCCGGCTCTCCTCCCAGCATCACACCAGCCATGCAGCCAGCAGGTCCTCC 
GGATCACTGTGGTTGGGTGGAGGTCTGTCTGCACTGGGAGCCTCAGGAGGGCTCTGCTCC 
ACCCACTTGGCTATGGGAGAGCCAGCAGGGGTTCTGGAGAAAGAAACTGGTGGGTTAGGG 

CCTTGGTCCAGGAGCCAGTTGAGCCAGGGCAGCCACATCCAGGCGTCTCCCTACCCTGGC
TCTGCCATCAGCCTTGAAGGGCCTCGATGAAGCCTTCTCTGGAACCACTCCAGCCCAGCT 
CCACCTCAGCCTTGGCCTTCACGCTGTGGAAGCAGCCAAGGCACTTCCTCACCCCCTCAG 
CGCCACGGACCTCTCTGGGGAGTGGCCGGAAAGCTCCCGGGCCTCTGGCCTGCAGGGCAG 
CCCAAGTCATGACTCAGACCAGGTCCCACACTGAGCTGCCCACACTCGAGAGCCAGATAT 

TTTTGTAGTTTTTATGCCTTTGGCTATTATGAAAGAGGTTAGTGTGTTCCCTGCAATAAA
CTTGTTCCTGAGAAAAA 



FIGURE 7

MSSQPAGNQTSPGATEDYSYGSWYIDEPQGGEELQPEGEVPSCHTSIPPGLYHACLASL
SILVLLLLAMLVRRRQLWPDCVRGRPGLPRPRAVPAAVFMVLLSSLCLLLPDEDALPFL
TLASAPSQDGKTEAPRGAWKILGLFYYAALYYPLAACATAGHTAAHLLGSTLSWAHLGV
QVWQRAECPQVPKIYKYYSLLASLPLLLGLGFLSLWYPVQLVRSFSRRTGAGSKGLQSS
YSEEYLRNLLCRKKLGSSYHTSKHGFLSWARVCLRHCIYTPQPGFHLPLKLVLSATLTG
TAIYQVALLLLVGVVPTIQKVRAGVTTDVSYLLAGFGIVLSEDKQEVVELVKHHLWALE
VCYISALVLSCLLTFLVLMRSLVTHRTNLRALHRGAALDLSPLHRSPHPSRQAIFCWMS
FSAYQTAFICLGLLVQQIIFFLGTTALAFLVLMPVLHGRNLLLFRSLESSWPFWLTLAL
AVILQNMAAHWVFLETHDGHPQLTNRRVLYAATFLLFPLNVLVGAIVATWRVLLSALYN
AIHLGQMDLSLLPPRAATLDPGYYTYRNFLKIEVSQSHPAMTAFCSLLLQAQSLLPRTM
AAPQDSLRPGEEDEGMQLLQTKDSMAKGARPGASRGRARWGLAYTLLHNPTLQVFRKTA
LLGANGAQP

Important features of the protein:
Signal peptide:
None

Transmembrane domain:
54-71
93-111
140-157
197-214
291-312
356-371
425-444
464-481
505-522

Motif name: N-glycosylation site.
8-12

Motif name: N-myristoylation site.
50-56
167-173
232-238
308-314
332-338
516-522
618-624
622-628
631-637
652-658

Motif name: Prokaryotic membrane lipoprotein lipid attachment site.
355-366

Motif name: ATP/GTP-binding site motif A (P-loop).
123-131
FIGURE 10

FIGURE 12B

FIGURE 16




FIGURE 
FIGURE 18




